the shallow web
a surprising fact: when google claims it has "3.4 billion results" for your search term it is essentially lying
google will only display a few hundred results at most for ANY search term
and it continues to show an incorrect count until you reach the very last page
you can verify this for yourself by searching for any term, then clicking through the pages until around page 20 or 30
or, after clicking through once, you can change the &start URL parameter to something like 800, and from there it will tell you the true result count
if you set &start to something higher than 1000, you get this error message:
"Sorry, Google does not serve more than 1000 results for any query. (You asked for results starting from 2000.)"
but usually it seems to be more like 100-200 before the duplicates message appears (explained below)
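here's a minimal sketch of the url trick, assuming the usual q and start query parameters (which is what the address bar shows after paging through manually), just printing urls to paste into a normal browser:

```python
# build google search urls at increasing result offsets, to paste into a browser
# and watch the reported count collapse -- the parameter names are an assumption
from urllib.parse import urlencode

query = "giraffe"
for start in (0, 200, 400, 800, 2000):
    print("https://www.google.com/search?" + urlencode({"q": query, "start": start}))
    # somewhere in the hundreds the billions-of-results figure usually turns into
    # the true total, and well past 1000 you get the error message quoted above
```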
when you do reach the last page, you see this message:
"In order to show you the most relevant results, we have omitted some entries very similar to the 172 already displayed. If you like, you can repeat the search with the omitted results included."
clicking the link in that message usually gets you perhaps up to 200 more results
in my experience these are actually duplicate results, so it's not giving you more info
a cynical observer such as myself wonders if this was introduced purely to create confusion regarding the overall result count and make the topic less communicable
like imagine someone starting a thread about this, and the first person who tries to verify it sees that message but doesn't click it and/or page through to the very end AGAIN (because disabling duplicates sets you back to page 1)
so if you're clicking through and not changing the &start parameter, it actually takes a long time and some dedication to get to the "true" last page while duplicate results are enabled
i saw a conspiracy youtube video where a guy spends minutes going through this process
and some people might be turned away from explanation posts by an extra few sentences explaining this duplicate message
i feel bad faith social engineers might consider this kind of thing
perhaps it's more surprising that so few people seem to know about this, ironically i can't find much online
from what i can tell this got worse maybe around 2015 and it used to show a lot more results
i have also subjectively observed decreases in google search result quality over the past few years
it used to be easy to make specific queries for several words in quotes but now it hardly works
definitely felt like it got "dumber" at some point though don't remember when
i have been trying to pin down exactly what about it i find so unnerving
it does bother me that they try to hide it, and show a false number of results that are never actually accessible
it's definitely obscured intentionally, because i think most people would be at least a little bothered if they discovered this
rather than actually being directly concerning, i think this phenomenon is more illustrative of other problems google search creates that can be hard to intuitively grasp
it's like seeing with your own eyes the ocean cascading off the edge of the world, or escaping a mirror maze
practically, there might as well only be one page for every search
looking around at CTR (click-through rate) figures suggests:
- about 25% of clicks are on the first result
- less than 1% of clicks are on results from the second page
if google showed every search result it would make no real difference to the number of times certain pages are accessed
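to make that concrete, here's a toy calculation: the per-position figures are invented, only anchored to the two stats above, but the conclusion doesn't depend much on them:

```python
# toy model: assumed click-through rate per result position, loosely anchored to
# "~25% on result 1" and "<1% of clicks on page 2" (the individual numbers are made up)
page1 = [0.25, 0.12, 0.08, 0.05, 0.04, 0.03, 0.02, 0.015, 0.012, 0.010]  # positions 1-10
page2 = [0.0005] * 10                                                     # positions 11-20

total = sum(page1) + sum(page2)   # ignoring deeper pages, which are smaller still
print(f"share of clicks on page 1: {sum(page1) / total:.1%}")   # ~99%
print(f"share of clicks on page 2: {sum(page2) / total:.2%}")   # well under 1%
```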
we could consider the space of websites that are accessible through generic search terms like "world war 2", "restaurant in [area]", "giraffe" to be "the shallow web"
anything that requires searching the exact website name or a specific string in quotes would fall through the sieve of commonly searched for terms
it becomes impossible for a naive user to find any sites outside the shallow web without discovering website names from a non-google source
there is also nuance in this dichotomy, because a lot of the time it can be hard to even find the non-site-specific terminology needed to refine a search
the reality is that the majority of sites including many good ones will always be practically inaccessible without extremely specific search terms, and that sucks
i think seeing the sudden and surprising limit of just a few hundred google search results for a common search term shatters the (understandable) illusion a lot of people have that the web is infinitely large
most sites have to attract traffic through links from "human search engines" on centralised social media platforms
because of these immense network effects, such efforts can no longer be self-hosted or practically indexed by google
in the larger context of increasing centralisation on earth, i can only imagine the overall effect search engines have to be extremely anti-competitive
whichever company holds the #1 spot for search terms relevant to their niche is getting the most effective possible advertising (people who specifically searched for the term) for free, and is gaining more secret SEO points and market share every time someone clicks their page
this encourages companies to merge or operate under another organisational layer so they can occupy the top search result together, even for small businesses run by tradespeople
it's also concerning to observe how recent many search results seem to be, even for historical search terms, and how biased they are in favour of news sites
i have found news articles to usually be misleading and of low quality, yet they have extreme popularity because people want to know the news
there's also the question of the extent to which google manually and/or automatically blacklists/deboosts some results for political reasons (whether it's to insulate themselves or to advance their own political aims), i know that this happens but i haven't looked into it enough
i would have liked to run some sort of automated process to record this data and maybe plot some graphs, but i don't think it would be easy without using some 3rd party api
i have run into google's anti-scraping mechanisms before just by doing regular searches
i don't want to get my IP blacklisted either as it would inconvenience my life
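for what it's worth, the recording half is trivial, the hard part is fetching the results at all
a rough sketch of what i have in mind, where fetch_serp() is a hypothetical stand-in for whichever 3rd party search api (or very polite scraper) you'd actually plug in:

```python
import csv
import time
from datetime import date

def fetch_serp(query, start):
    """hypothetical stand-in: return the list of result urls for `query` at offset
    `start`, via whatever 3rd party search api you trust (scraping google directly
    trips its anti-bot checks fairly quickly)"""
    raise NotImplementedError

def count_reachable_results(query, page_size=10, max_offset=1000):
    """page through the results, counting unique urls until they run out"""
    seen = set()
    for start in range(0, max_offset, page_size):
        new = [u for u in fetch_serp(query, start) if u not in seen]
        if not new:          # empty page or nothing but repeats: we've hit the floor
            break
        seen.update(new)
        time.sleep(2)        # go slowly, partly out of self-preservation
    return len(seen)

# append one row per query per day, for plotting later (needs a real fetch_serp)
with open("result_counts.csv", "a", newline="") as f:
    writer = csv.writer(f)
    for q in ["giraffe", "world war 2"]:
        writer.writerow([date.today().isoformat(), q, count_reachable_results(q)])
```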
and if we're talking alternative search engines, google has >80% market share, so they clearly hold all the power here
the popular alternatives are all the same corporate garbage anyway so it wouldn't matter much if there was actual competition in this area
also i find it equal parts hilarious and disgusting to see bing slowly gaining market share after all the obnoxious attempts microsoft makes to funnel naive users into using it